Abstract

Two  key  issues  for  induction  algorithms  are  the  accu¬ 
racy  of  the  learned  hypothesis  and  the  computational 
resources  consumed  in  inducing  that  hypothesis.  One 
of  the  most  promising  ways  to  improve  performance 
along  both  dimensions  is  to  make  use  of  additional 
knowledge.  Multi-strategy  learning  algorithms  tackle 
this  problem  by  employing  several  strategies  for  han¬ 
dling  different  kinds  of  knowledge  in  different  ways. 
However,  integrating  knowledge  into  an  induction  al¬ 
gorithm  can  be  difficult  when  the  new  knowledge  dif¬ 
fers  significantly  from  the  knowledge  the  algorithm 
already  uses.  In  many  cases  the  algorithm  must  be 
rewritten. 

This paper presents KII, a Knowledge Integration
Framework for Induction, that provides a uniform
mechanism for integrating knowledge into induction.

In  theory,  arbitrary  knowledge  can  be  integrated  with 
this  mechanism,  but  in  practice  the  knowledge  rep¬ 
resentation  language  determines  both  the  knowledge 
that  can  be  integrated,  and  the  costs  of  integration 
and  induction.  By  instantiating  KII  with  various  set 
representations,  algorithms  can  be  generated  at  differ¬ 
ent  trade-off  points  along  these  dimensions. 

One  instantiation  of  KII,  called  RS-KII,  is  presented 
that  can  implement  hybrid  induction  algorithms,  de¬ 
pending  on  which  knowledge  it  utilizes.  RS-KII  is 
demonstrated  to  implement  AQ-11  (Michalski  1978), 
as  well  as  a  hybrid  algorithm  that  utilizes  a  domain 
theory and noisy examples. Other algorithms are also
possible.

Introduction 

Two  key  criteria  for  evaluating  induction  algorithms 
are  the  accuracy  of  the  induced  hypothesis  and  the 
computational  cost  of  inducing  that  hypothesis.  One  of 
the  most  powerful  ways  to  achieve  improvements  along 
both  of  these  dimensions  is  by  integrating  additional 
knowledge  into  the  induction  process.  Knowledge  con¬ 
sists  of  examples,  domain  theories,  heuristics,  and  any 
other information that affects which hypothesis is
induced; that is, knowledge is examples plus biases.

A given single-strategy learning algorithm can utilize
some knowledge very effectively, other knowledge less
effectively, and some knowledge not at all. By using
multiple strategies, an induction algorithm can make more
effective  use  of  a  wider  range  of  knowledge,  thereby  im¬ 
proving  performance.  However,  even  a  multi-strategy 
learning  algorithm  can  only  make  use  of  knowledge  for 
which  its  strategies  are  designed. 

In  order  to  utilize  new  kinds  of  knowledge,  the 
knowledge  must  either  be  recast  as  a  kind  for  which 
the  algorithm  already  has  a  strategy — for  example,  in¬ 
tegrating  type  constraints  into  FOIL  by  casting  them 
as  pseudo  negative  examples  (Quinlan  1990) — or  the 
algorithm  must  be  rewritten  to  take  advantage  of  the 
new  knowledge  by  adding  a  new  strategy  or  modifying 
an  existing  one.  The  first  approach — recasting  knowl¬ 
edge — is  limited  by  the  expressiveness  of  the  knowledge 
already  used  by  the  algorithm.  If  the  new  knowledge 
cannot  be  expressed  in  terms  of  the  existing  kinds  of 
knowledge,  then  the  new  knowledge  cannot  be  utilized. 
The  second  approach — rewriting  an  algorithm  to  uti¬ 
lize  a  new  kind  of  knowledge — is  difficult.  It  also  fails 
to  solve  the  underlying  problem — if  yet  another  kind  of 
knowledge  is  made  available,  the  algorithm  may  have 
to  be  modified  once  again. 

What  is  needed  is  an  easier  way  to  integrate  knowl¬ 
edge  into  induction.  One  approach  for  doing  this  ex¬ 
ploits  the  observation  that  a  knowledge  fragment  plus 
a  strategy  for  using  that  knowledge  constitutes  a  bias, 
since  together  they  determine  which  hypothesis  is  in¬ 
duced.  These  biases  can  be  expressed  uniformly  in 
terms  of  constraints  and  preferences  on  the  hypothesis 
space.  The  induced  hypothesis  is  the  most  preferred 
hypothesis  among  those  that  satisfy  the  constraints. 
New  knowledge  and  strategies  are  integrated  into  in¬ 
duction  by  combining  their  constraints  and  preferences 
with  those  previously  integrated. 

This  approach  is  formalized  in  a  framework  called 
KII.  This  framework  represents  constraints  and  pref¬ 
erences  as  sets,  and  provides  set-based  operations  for 
integrating  knowledge  expressed  in  this  way,  and  for 
inducing  hypotheses  from  the  integrated  knowledge. 
Converting knowledge into constraints and preferences
is  handled  by  translators  (Cohen  1992),  which  are  writ¬ 
ten  by  the  user  for  each  knowledge  fragment,  or  class 
of  related  knowledge  fragments. 


Since  KII  is  defined  in  terms  of  sets  and  set  oper¬ 
ations,  some  set  representation  must  be  specified  in 
order for KII to be operational. The set representation
determines the kinds of knowledge that can be
expressed,  and  also  determines  the  computational  com¬ 
plexity  of  integration  and  induction.  Each  set  repre¬ 
sentation  yields  an  instantiation  of  KII  at  a  different 
trade-off  point  between  expressiveness  and  computa¬ 
tional  complexity. 

This  approach  is  most  similar  to  that  of  Russell  and 
Grosof  (Russell  &  Grosof  1987),  in  which  biases  are 
represented  as  determinations,  and  the  hypothesis  is 
deduced  from  the  determinations  and  examples  by  a 
theorem  prover.  As  in  KII,  the  inductive  leaps  come 
from  biases,  which  may  be  grounded  in  supposition  in¬ 
stead  of  fact.  A  major  difference  between  this  system 
and KII is KII's ability to select different set
representations, which allows different trade-offs to be made
between  expressiveness  and  cost.  Determinations,  by 
contrast,  are  at  a  fixed  trade-off  point,  although  one 
could  imagine  using  restricted  logics. 

One advantage of KII's formal relationship between
the set representation and the cost/expressiveness
trade-off is that it allows formal analysis of these
trade-offs. In particular, an upper limit can be established on
the  expressiveness  of  the  set  representations  for  which 
induction  is  even  computable.  This  sets  a  practical 
limit  on  the  kinds  of  knowledge  that  can  be  utilized  by 
induction. 

Among  the  set  representations  below  this  limit,  there 
are  a  number  that  generate  useful  instantiations  of 
KII.  Most  notably,  Incremental  Version  Space  Merg¬ 
ing  (Hirsh  1990)  can  be  generated  by  using  a  boundary 
set  representation  for  constraints  (i.e.,  version  spaces), 
and  an  empty  representation  for  preferences;  and  an 
algorithm  similar  to  Grendel  (Cohen  1992)  can  be  in¬ 
stantiated  from  KII  by  representing  sets  as  antecedent 
description  grammars  (essentially  context  free  gram¬ 
mars).  These  will  be  discussed  briefly.  A  new  algo¬ 
rithm,  RS-KII,  is  instantiated  from  KII  by  represent¬ 
ing  sets  as  regular  grammars.  This  algorithm  seems 
to  strike  a  good  balance  between  expressiveness  and 
complexity. 

RS-KII  can  use  a  wide  range  of  knowledge,  and  com¬ 
bine  this  knowledge  in  a  number  of  ways.  This  makes 
it  a  good  multi-strategy  algorithm.  RS-KII  can  use 
the  knowledge  and  strategies  of  at  least  two  exist¬ 
ing  algorithms,  the  Candidate  Elimination  Algorithm 
(Mitchell  1982)  and  AQ-11  with  a  beam  width  of  one 
(Michalski 1978). It can also utilize additional
knowledge, such as a domain theory and noisy examples.
Although  space  limits  us  from  discussing  all  of  these 
in  detail,  the  translators  needed  to  implement  AQ-11 
are  demonstrated,  as  well  as  those  for  the  domain  the¬ 
ory  and  noisy  examples.  When  utilizing  only  the  AQ- 
11  knowledge,  RS-KII  induces  the  same  hypotheses  as 
AQ-11  with  a  beam  width  of  one,  with  a  computational 
complexity that is only a little worse. When RS-KII
utilizes the translators for the additional knowledge,
RS-KII  induces  a  more  accurate  hypothesis  than  AQ- 
11,  and  in  much  less  time.  RS-KII  looks  able  to  ex¬ 
press  and  integrate  other  common  knowledge  sources 
and  strategies  as  well,  though  this  is  an  area  for  future 
research. 

The  Knowledge  Integration  Framework 

This  section  formally  describes  KII,  a  Knowledge  In¬ 
tegration  Framework  for  Induction.  The  combination 
of  a  knowledge  fragment  and  a  strategy  for  using  that 
knowledge  can  be  considered  a  bias,  which  is  expressed 
in  terms  of  constraints  and  preferences  over  the  hy¬ 
pothesis  space.  For  instance,  a  positive  example  and 
a  strategy  that  assumes  the  target  concept  is  strictly 
consistent  with  the  examples,  would  be  translated  as 
a  constraint  that  is  satisfied  only  by  hypotheses  that 
cover the example. A strategy that assumes noisy
examples might instead be expressed as a preference for
hypotheses that are most consistent with the example,
without rejecting inconsistent hypotheses outright.

The  biases  are  integrated  into  a  single  composite  bias 
by  combining  their  respective  constraints  and  prefer¬ 
ences.  The  composite  bias,  which  includes  the  exam¬ 
ples,  wholly  determines  the  selection  of  the  induced 
hypothesis. If there are several hypotheses that the
bias finds equally acceptable, any one of them may be
selected arbitrarily as the target concept. The set of these
equally acceptable hypotheses is called the solution set.
In this view, integration precedes induction, rather than
being part of it. This separation
makes  it  easier  to  integrate  knowledge  into  induction, 
since  the  effects  of  each  process  are  clearer. 

KII formalizes these ideas as follows. Each bias is
expressed as a triple of sets, $\langle H, C, P \rangle$, where H
is the hypothesis space, C is the set of hypotheses that
satisfy the constraints of all the biases, and P is a set
of hypothesis pairs, $\langle x, y \rangle$, such that x is less preferred
than y by at least one of the biases. The solution set,
from which the induced hypothesis is selected arbitrarily,
is the set of most preferred hypotheses among those
that satisfy the constraints; namely, the hypotheses in
C for which no other hypothesis in C is preferable,
according to P. Formally, the solution set is
$\{x \in C \mid \forall y \in C,\ \langle x, y \rangle \notin P\}$.

KII  provides  several  operations  on  knowledge  ex¬ 
pressed  in  this  representation:  translation,  integra¬ 
tion,  induction  (selecting  a  hypothesis  from  the  solu¬ 
tion  set),  and  solution  set  queries.  These  operations, 
as  well  as  the  solution  set  itself,  are  defined  in  terms 
of  set  operations  on  H,  C,  and  P.  These  operators  are 
described  in  detail  below. 

Translation  Knowledge is converted from the form
in which it occurs (its naturalistic representation
(Rosenbloom et al. 1993)) into $\langle H, C, P \rangle$ triples by
translators (Cohen 1992). Since knowledge is translated
into constraints and preferences over the hypothesis
space, the implementation of each translator depends
on both the hypothesis space and the knowledge.
In the worst case, a different implementation is
required  for  each  pair  of  knowledge  fragment  and  hy¬ 
pothesis  space.  Since  there  are  a  potentially  infinite 
number  of  translators,  they  are  not  provided  as  part  of 
the  KII  formalism,  but  must  be  provided  by  the  user 
as  needed. 

Fortunately,  closely  related  pairs  of  hypothesis  space 
and  knowledge  often  have  similar  translations,  allow¬ 
ing  a  single  translator  to  be  written  for  all  of  the  pairs. 
One  such  translator,  which  will  be  described  in  de¬ 
tail  later,  takes  as  input  an  example  and  a  hypothesis 
space.  The  example  can  be  any  member  of  the  instance 
space,  and  the  hypothesis  space  is  selected  from  a  fam¬ 
ily  of  languages  by  specifying  the  set  of  features.  The 
same  translator  works  for  every  pair  of  example  and 
hypothesis  language  in  this  space. 

Integration  Translated knowledge fragments are
integrated by composing their $\langle H, C, P \rangle$ triples. A
hypothesis can only be the induced hypothesis if it is
accepted by the constraints of all of the knowledge
fragments, and if the combined preferences of the
knowledge fragments do not prefer some other hypothesis.
That  is,  the  induced  hypothesis  must  satisfy  the  con¬ 
junction  of  the  constraints,  and  be  preferred  by  the 
disjunction  of  the  preferences.  This  reasoning  is  cap¬ 
tured  in  the  following  definition  for  the  integration  of 
two tuples, $\langle H, C_1, P_1 \rangle$ and $\langle H, C_2, P_2 \rangle$. The
hypothesis space is the same in both cases, since it is not clear
what  it  means  to  integrate  knowledge  about  target  hy¬ 
potheses  from  different  hypothesis  spaces. 

$$\mathrm{Integrate}(\langle H, C_1, P_1 \rangle, \langle H, C_2, P_2 \rangle) = \langle H,\ C_1 \cap C_2,\ P_1 \cup P_2 \rangle \tag{1}$$
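
As a minimal sketch of the integration operator, assuming extensional sets, a triple can be modeled as a small record and integrated with set intersection and union. The `Bias` type, its field names, and the error raised on mismatched hypothesis spaces are assumptions made for illustration.

```python
from typing import NamedTuple


class Bias(NamedTuple):
    H: frozenset  # hypothesis space
    C: frozenset  # hypotheses satisfying the constraints
    P: frozenset  # (x, y) pairs: x is less preferred than y


def integrate(b1: Bias, b2: Bias) -> Bias:
    """Intersect the constraint sets and union the preference sets."""
    if b1.H != b2.H:
        raise ValueError("both triples must share one hypothesis space")
    return Bias(b1.H, b1.C & b2.C, b1.P | b2.P)


H = frozenset("abcd")
b1 = Bias(H, frozenset("abc"), frozenset({("a", "b")}))
b2 = Bias(H, frozenset("bcd"), frozenset({("b", "c")}))
print(integrate(b1, b2))
```

Because intersection and union are associative and commutative, any number of triples can be folded together in any order under this sketch.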

The integration operator assumes that the knowledge
is consistent. That is, $C_1$ and $C_2$ are not mutually
exclusive, and $P_1 \cup P_2$ does not contain cycles
(e.g., a < b and b < a). Although such knowledge
can  be  integrated,  the  inconsistencies  will  not  be  dealt 
with  in  any  significant  fashion.  Mutually  exclusive  con¬ 
straints  will  result  in  an  empty  solution  set,  and  cycles 
are  broken  arbitrarily  by  assuming  every  element  of 
the  cycle  is  dominated.  Developing  more  sophisticated 
strategies  for  dealing  with  contradictions  is  an  area  for 
future  research. 

Although  KII  does  not  deal  with  contradictory 
knowledge, it can deal with uncertain knowledge. For
example,  noisy  examples  and  incomplete  domain  the¬ 
ories  can  both  be  utilized  in  KII.  Translators  for  these 
knowledge  sources  are  described  later. 

Induction  and  Solution  Set  Queries  The  inte¬ 
grated  knowledge  is  represented  by  a  single  tuple, 
$\langle H, C, P \rangle$. The target concept is induced from the
integrated knowledge by selecting an arbitrary hypothesis
from the solution set of $\langle H, C, P \rangle$. KII also supports
queries  about  the  solution  set,  such  as  whether  it  is 
empty,  a  singleton,  contains  a  given  hypothesis,  or  is 
a  subset  of  some  other  set.  These  correspond  to  the 
operations that have proven empirically useful for
version spaces (Hirsh 1992), which can be thought of as
solution  sets  for  knowledge  expressed  as  constraints. 

It is conjectured that these four queries plus the
ability to select a hypothesis from the solution set are
sufficient for the vast majority of induction tasks. Most
existing induction algorithms involve only the
enumeration operator and perhaps an Empty or Unique query.
The Candidate Elimination algorithm (Mitchell 1982)
and Incremental Version Space Merging (IVSM) (Hirsh
1990) use all four queries, but do not select a
hypothesis from the solution set (they return the entire set).

The  queries  and  selection  of  a  hypothesis  from  the 
solution  set  can  be  implemented  in  terms  of  a  single 
enumeration operator. The enumeration operator
returns n elements of a set, S, where n is specified by
the user. It is defined formally as follows.

$$\mathrm{Enumerate}(S, n) \rightarrow \{h_1, h_2, \ldots, h_m\}$$

where $m = \min(n, |S|)$ and $\{h_1, h_2, \ldots, h_m\} \subseteq S$.

Normally, S is the solution set of $\langle H, C, P \rangle$. It can
sometimes be cheaper to compute the first few elements
of the solution set from $\langle H, C, P \rangle$ than to compute even
the intensional representation of the solution set from
$\langle H, C, P \rangle$. Therefore, the S argument to the
enumeration operator can be either an $\langle H, C, P \rangle$ tuple, or a set
expression involving an $\langle H, C, P \rangle$ tuple and other sets.
This allows the enumeration operator to use whatever
optimizations seem appropriate. A different
implementation of the enumerate operator is needed for different
set representations of S, H, C, and P.

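A hypothesis is induced by selecting a single
hypothesis from the solution set. This is done with a call to
$\mathrm{Enumerate}(\langle H, C, P \rangle, 1)$. The four queries are
implemented as shown below, where S is the solution set
of tuple $\langle H, C, P \rangle$, A is a set of hypotheses in H,
and h is a hypothesis in H.

•  $\mathrm{Empty}(S) \Leftrightarrow \mathrm{Enumerate}(\langle H, C, P \rangle, 1) = \emptyset$

•  $\mathrm{Unique}(S) \Leftrightarrow |\mathrm{Enumerate}(\langle H, C, P \rangle, 2)| = 1$

•  $\mathrm{Member}(h, S) \Leftrightarrow \mathrm{Enumerate}(\langle H, C, P \rangle \cap \{h\}, 1) \neq \emptyset$

•  $\mathrm{Subset}(S, A) \Leftrightarrow \mathrm{Enumerate}(\langle H, C, P \rangle \cap \overline{A}, 1) = \emptyset$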
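
The four queries can be sketched on top of a single enumeration function. This sketch assumes the solution set S has already been materialized as a Python set, which is a simplification: the text allows the argument to be a tuple or a set expression. The function names, and the extra H parameter used to form the complement of A, are assumptions.

```python
from itertools import islice


def enumerate_set(S, n):
    """Return min(n, |S|) elements of S; which ones is arbitrary."""
    return set(islice(S, n))


def empty(S):
    return len(enumerate_set(S, 1)) == 0


def unique(S):
    return len(enumerate_set(S, 2)) == 1


def member(h, S):
    return len(enumerate_set(S & {h}, 1)) != 0


def subset(S, A, H):
    # S is a subset of A exactly when S shares nothing with H minus A.
    return len(enumerate_set(S & (H - A), 1)) == 0


H = {"s??", "??c", "s?c", "???"}
S = {"s??", "??c"}
print(empty(S), unique(S), member("s??", S), subset(S, {"s??"}, H))
```

Each query asks for at most two elements, so a representation that can produce the first few elements cheaply never has to build the whole solution set.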

An  Example  Induction  Task 

An  example  of  how  KII  can  solve  a  simple  induction 
task  is  given  below.  Sets  have  been  represented  exten- 
sionally  in  this  example.  Although  this  is  not  the  only 
possible  set  representation,  and  is  generally  a  poor  one, 
it  is  the  simplest  one  for  illustrative  purposes. 

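The Hypothesis Space  The target concept is a
member of a hypothesis space in which hypotheses
are described by conjunctive feature vectors. There
are three features: size, color, and shape. The values
for these features are
$\mathit{size} \in \{\text{small}, \text{large}, \text{any-size}\}$,
$\mathit{color} \in \{\text{black}, \text{white}, \text{any-color}\}$, and
$\mathit{shape} \in \{\text{circle}, \text{rectangle}, \text{any-shape}\}$.
Hypotheses are described as 3-tuples from
$\mathit{size} \times \mathit{color} \times \mathit{shape}$.
For shorthand identification, a value is specified by the
first character of its name, except for the any values,
which are represented by a "?". So the hypothesis
$\langle \text{any-size}, \text{white}, \text{circle} \rangle$ would be written as ?wc.

Instances are the "ground" hypotheses. An instance
is a tuple $\langle \mathit{size}, \mathit{color}, \mathit{shape} \rangle$ where
$\mathit{color} \in \{\text{black}, \text{white}\}$,
$\mathit{size} \in \{\text{small}, \text{large}\}$, and
$\mathit{shape} \in \{\text{circle}, \text{rectangle}\}$.

•  $\mathrm{TranPosExample}(H, \langle z, c, s \rangle) \rightarrow \langle C, \{\} \rangle$ where

$C = \{x \in H \mid x \text{ covers } \langle z, c, s \rangle\}$

$= \{z, \text{any-size}\} \times \{c, \text{any-color}\} \times \{s, \text{any-shape}\}$

•  $\mathrm{TranNegExample}(H, \langle z, c, s \rangle) \rightarrow \langle C, \{\} \rangle$ where

$C = \{x \in H \mid x \text{ does not cover } \langle z, c, s \rangle\}$

$= $ complement of $\{z, \text{any-size}\} \times \{c, \text{any-color}\} \times \{s, \text{any-shape}\}$

•  $\mathrm{TranPreferGeneral}(H) \rightarrow \langle H, P \rangle$ where

$P = \{\langle x, y \rangle \in H \times H \mid x \text{ is more specific than } y\}$

$= \{\langle \text{sbr}, \text{?br} \rangle, \langle \text{sbr}, \text{s?r} \rangle, \langle \text{sbr}, \text{??r} \rangle, \langle \text{swr}, \text{?wr} \rangle, \ldots\}$

Figure 1: Translators.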

Available Knowledge  The available knowledge
consists of three examples (classified instances) and
an assumption that accuracy increases with generality.
Two of the examples are positive and one is negative.
The two positive examples are $e_1$ = swc and
$e_2$ = sbc. The negative example is $e_3$ = lwr. The
target concept is s??. That is, size = small, and color
and shape are irrelevant.

Translators  The  first  step  is  to  translate  the  knowl¬ 
edge  into  constraints  and  preferences.  Three  trans¬ 
lators  are  constructed,  one  for  each  type  of  knowl¬ 
edge:  the  positive  examples,  negative  examples,  and 
the  generality  preference.  These  translators  are  shown 
in  Figure  1.  Since  the  hypothesis  space  is  understood, 
$\langle H, C, P \rangle$ tuples will generally be referred to as just
$\langle C, P \rangle$ tuples for the remainder of this illustration.

The  examples  are  translated  in  this  scenario  under 
the  assumption  that  they  are  correct;  that  is,  the  tar¬ 
get  concept  covers  all  of  the  positive  examples  and 
none  of  the  negatives.  Positive  examples  are  trans¬ 
lated  as  constraints  satisfied  only  by  hypotheses  that 
cover  the  example.  Negative  examples  are  translated 
similarly,  except  that  hypotheses  must  not  cover  the 
example. The bias for general hypotheses is translated
into a $\langle C, P \rangle$ pair where C is H (it rejects nothing),
and $P = \{\langle x, y \rangle \in H \times H \mid x \text{ is more specific than } y\}$.
Hypothesis x is more specific than hypothesis y if y
can be obtained from x by replacing some of the values
in x with "any" values. For example,
swr is more specific than ?wr, but there is no ordering
between lwc and swr.


Integration and Induction  Examples $e_1$ and $e_2$
are translated by TranPosExample into $\langle H, C_1, \emptyset \rangle$ and
$\langle H, C_2, \emptyset \rangle$, respectively. Example $e_3$ is translated by
TranNegExample into $\langle H, C_3, \emptyset \rangle$. The preference for
general hypotheses is translated into $\langle H, H, P_4 \rangle$. These
tuples are integrated into a single tuple,
$\langle H, C, P \rangle = \langle H,\ C_1 \cap C_2 \cap C_3 \cap H,\ \emptyset \cup \emptyset \cup \emptyset \cup P_4 \rangle$.
This tuple represents the combined biases of the four
knowledge fragments.
A hypothesis is induced by selecting one arbitrarily
from the solution set of $\langle H, C, P \rangle$. This is accomplished
by calling $\mathrm{Enumerate}(\langle H, C, P \rangle, 1)$. The solution set
consists of the undominated elements of C with
respect to the dominance relation P. C contains three
elements, s??, ??c, and s?c. P prefers both s?? and ??c
to s?c, but there is no preference ordering between s??
and ??c. The undominated elements of C are therefore
s?? and ??c. One of these is selected arbitrarily as the
induced hypothesis.
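
The whole example can be replayed with extensional sets. The sketch below is one possible encoding, assuming hypotheses are three-character strings in the shorthand described earlier; the helper names (`covers`, `more_specific`, `tran_pos`, and so on) are illustrative rather than taken from an implementation of KII.

```python
from itertools import product

# Hypothesis space: size x color x shape, with "?" for the "any" values.
H = {"".join(t) for t in product("sl?", "bw?", "cr?")}


def covers(h, inst):
    return all(a == "?" or a == b for a, b in zip(h, inst))


def more_specific(x, y):
    # y is x with some values replaced by "?".
    return x != y and all(b == "?" or a == b for a, b in zip(x, y))


def tran_pos(H, inst):
    return {h for h in H if covers(h, inst)}, set()


def tran_neg(H, inst):
    return {h for h in H if not covers(h, inst)}, set()


def tran_prefer_general(H):
    return set(H), {(x, y) for x in H for y in H if more_specific(x, y)}


def integrate(t1, t2):
    return t1[0] & t2[0], t1[1] | t2[1]


def solution_set(C, P):
    return {x for x in C if not any((x, y) in P for y in C)}


fragments = [tran_pos(H, "swc"), tran_pos(H, "sbc"),
             tran_neg(H, "lwr"), tran_prefer_general(H)]
C, P = fragments[0]
for t in fragments[1:]:
    C, P = integrate((C, P), t)

print(sorted(C))
print(sorted(solution_set(C, P)))
```

Under this encoding the integrated constraint set and the undominated elements match the sets named in the text.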

Instantiating  KII 

In  order  to  implement  KII,  specific  set  representations 
for H, C, and P are necessary. These representations
can  be  as  simple  as  an  extensional  set,  or  as  powerful  as 
arbitrary  Turing  machines.  However,  some  representa¬ 
tion  is  needed.  The  representation  determines  which 
knowledge can be expressed in terms of $\langle H, C, P \rangle$
tuples and integrated. It also determines the
computational complexity of the integration and enumeration
operations,  which  are  defined  in  terms  of  set  opera¬ 
tions.  By  instantiating  KII  with  different  set  represen¬ 
tations,  algorithms  can  be  generated  at  different  trade¬ 
off  points  between  cost  and  expressiveness. 

The  space  of  possible  set  representations  maps  onto 
the  space  of  grammars.  Every  computable  set  is  the 
language  of  some  grammar.  Similarly,  every  com¬ 
putable  set  representation  is  equivalent  to  some  class  of 
grammars.  These  classes  include,  but  are  not  limited 
to,  the  classes  of  the  Chomsky  hierarchy  (Chomsky 
1959) — regular,  context  free,  context  sensitive,  and  re¬ 
cursively  enumerable  (r.e.).  The  complexity  of  set  op¬ 
erations  generally  increases  with  the  expressiveness  of 
the  language  class. 

Allowing H, C, and P to be recursively enumerable
(i.e., arbitrary Turing machines) would certainly
provide the most expressiveness. Although $\langle H, C, P \rangle$
tuples with r.e. sets can be expressed and integrated, the
solution  sets  of  some  such  tuples  are  uncomputable, 
and  there  is  no  way  to  know  which  tuples  have  this 
property.  This  will  be  discussed  in  more  detail  be¬ 
low.  Since  it  is  impossible  to  enumerate  even  a  single 
element  of  an  uncomputable  set,  it  is  impossible  to  in¬ 
duce  a  hypothesis  by  selecting  one  from  the  solution 
set.  There  is  clearly  a  practical  upper  limit  on  the 
expressiveness  of  the  set  representations. 

It  is  possible  to  establish  the  most  expressive  lan¬ 
guages  for  C  and  P  that  guarantee  a  computable  so¬ 
lution  set.  This  establishes  a  practical  limit  on  the 
knowledge  that  can  be  integrated  into  induction. 


By  definition,  the  solution  set  is  computable  if  and 
only  if  it  is  recursively  enumerable.  The  solution  set 
can  always  be  constructed  by  applying  a  formula  of  set 
operations  to  C  and  P,  as  will  be  shown  below.  The 
most  restrictive  language  in  which  the  solution  set  can 
be  expressed  can  be  derived  from  this  formula  and  the 
set representations for C and P by using the
closure properties of these set operations. Inverting this
function  yields  the  most  expressive  C  and  P  represen¬ 
tations  for  which  the  solution  set  is  guaranteed  to  be 
at  most  recursively  enumerable. 

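The solution set can be computed from C and P
according to the expression $\overline{\mathit{first}((C \times C) \cap P)} \cap C$. The
derivation is shown in Equation 2, below. In this
definition, the function $\mathit{first}(\{\langle x_1, y_1 \rangle, \langle x_2, y_2 \rangle, \ldots\})$ is a
projection returning the set of tuple first-elements, namely
$\{x_1, x_2, \ldots\}$.

$$
\begin{aligned}
\mathrm{SolnSet}(\langle H, C, P \rangle) &= \{x \in C \mid \forall y \in C,\ \langle x, y \rangle \notin P\} \\
&= \overline{\{x \in H \mid (x \in C \text{ and } \exists y \in C,\ \langle x, y \rangle \in P) \text{ or } x \notin C\}} \\
&= \overline{\{x \in H \mid x \in C \text{ and } \exists y \in C,\ \langle x, y \rangle \in P\} \cup \overline{C}} \\
&= \overline{\mathit{first}(\{\langle x, y \rangle \in C \times C \mid \langle x, y \rangle \in P\})} \cap C \\
&= \overline{\mathit{first}((C \times C) \cap P)} \cap C
\end{aligned}
\tag{2}
$$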
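
The equivalence derived in Equation 2 can be spot-checked on small extensional sets. The following sketch compares the set-builder definition with the closed-form expression on randomly generated C and P, taking complements relative to H; the function names, the integer hypothesis space, and the random test data are assumptions made for illustration.

```python
import random


def first(pairs):
    """Projection onto the first element of each pair."""
    return {x for x, _ in pairs}


def soln_by_definition(C, P):
    return {x for x in C if all((x, y) not in P for y in C)}


def soln_by_formula(H, C, P):
    CxC = {(x, y) for x in C for y in C}
    # Complement of first((C x C) & P) relative to H, then intersect with C.
    return (H - first(CxC & P)) & C


random.seed(0)
H = set(range(6))
for _ in range(200):
    C = {h for h in H if random.random() < 0.5}
    P = {(x, y) for x in H for y in H
         if x != y and random.random() < 0.3}
    assert soln_by_definition(C, P) == soln_by_formula(H, C, P)
print("definition and formula agree on all sampled cases")
```

A check like this does not replace the derivation, but it catches a misplaced complement or intersection quickly.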

The  least  expressive  representation  in  which  the  so¬ 
lution  set  can  be  represented  can  be  computed  from 
the  closure  of  the  above  equation  over  the  C  and 
P  set  representations.  To  do  this,  it  helps  to  know 
the  closure  properties  for  the  individual  set  operations 
in  the  equation:  intersection,  complement,  Cartesian 
product, and projection (first). The closure
properties of intersection and complement are well known for
most  language  classes,  although  it  is  an  open  problem 
whether  the  context  sensitive  languages  are  closed  un¬ 
der  complementation  (Hopcroft  &  Ullman  1979).  The 
closure  properties  of  projection  and  Cartesian  prod¬ 
uct  are  not  known  as  such,  but  these  operations  map 
onto other operations for which closure properties are
known. 

The Cartesian product of two grammars, $A \times B$, can
be represented by their concatenation, $AB$. The tuple
$\langle x, y \rangle$ is represented by the string xy. The Cartesian
product can also be represented by interleaving the
strings in A and B so that $\langle x, y \rangle$ is represented by a
string in which the symbols in x and y alternate.
Interleaving can sometimes represent subsets of $A \times B$ that
concatenation cannot, depending on the language in
which the product is expressed. The closure
properties of languages under Cartesian product depend on
which approach is used. The following discussion
derives limits on the languages for $C \times C$ and P. When
the language for C is closed under Cartesian product,
then the limits on $C \times C$ also apply to C, since both
can be expressed in the same language. Otherwise,
the limits on C have to be derived from those on $C \times C$
using the closure properties of the given
implementation of Cartesian product. However, when C is not
closed under Cartesian product, the language for C is
necessarily less expressive than that for $C \times C$. The
expressiveness limits on $C \times C$ therefore provide a good
upper bound on the expressiveness of C that is
independent of the Cartesian product implementation.

Regardless  of  the  representation  used  for  Cartesian 
product, projection can be implemented as a
homomorphism (Hopcroft & Ullman 1979), which is a
mapping from symbols in one alphabet to strings in
another. Homomorphisms can be used to erase symbols
from  strings  in  a  language,  which  is  exactly  what  pro¬ 
jection  does — it  erases  symbols  from  the  second  field  of 
a  tuple,  leaving  only  the  symbols  from  the  first  field. 
A  more  detailed  derivation  of  the  properties  for  pro¬ 
jection  and  Cartesian  product  can  be  found  in  (Smith 
1995). 
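
To make the idea concrete, here is a small sketch of projection as a symbol-erasing homomorphism over the concatenation encoding. It assumes, purely for illustration, that the two fields use disjoint alphabets: first-field symbols are lowercase letters and second-field symbols are uppercase letters. That marking scheme and the function names are assumptions of this sketch, not the construction given in (Smith 1995).

```python
def encode_pair(x: str, y: str) -> str:
    """Encode the tuple <x, y> as a concatenation: x lowercase, y uppercase."""
    return x.lower() + y.upper()


def erase_second_field(symbol: str) -> str:
    """Homomorphism on single symbols: keep first-field symbols, erase the rest."""
    return symbol if symbol.islower() else ""


def project_first(language: set) -> set:
    """Apply the homomorphism symbol by symbol to every string in the language."""
    return {"".join(erase_second_field(s) for s in w) for w in language}


A = {"ab", "b"}
B = {"a", "ba"}
AxB = {encode_pair(x, y) for x in A for y in B}
print(sorted(AxB))
print(sorted(project_first(AxB)))
```

Because the mapping is defined per symbol and then extended to strings, it is a homomorphism in the formal-language sense, so the closure properties of a language class under homomorphism apply to projection.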

The  closure  properties  of  languages  under  projec¬ 
tion,  intersection,  intersection  with  a  regular  grammar, 
and complement are summarized in Table 1. It should
be clear that the solution set, $\overline{\mathit{first}((C \times C) \cap P)} \cap C$, is
r.e. when $(C \times C) \cap P$ is at most context free, and
uncomputable when it is any more expressive than that.
For example, if $(C \times C) \cap P$ is context sensitive, then
$\mathit{first}((C \times C) \cap P)$ is r.e. The complement of a set that
is r.e. but not recursive is uncomputable (Hopcroft &
Ullman 1979), so the solution set,
$\overline{\mathit{first}((C \times C) \cap P)} \cap C$, is
uncomputable. A complete proof appears in (Smith
1995).

There are several ways to select C, P, and the
implementation of Cartesian product, such that $(C \times C) \cap P$
is  at  most  context  free.  The  expressiveness  of  both  C 
and  P  can  be  maximized  by  choosing  one  of  C  and 
P  to  be  at  most  regular,  and  the  other  to  be  at  most 
context  free.  This  is  because  CFLs  are  closed  under  in¬ 
tersection  with  regular  sets,  but  not  with  other  CFLs. 
Regular  sets  are  closed  under  all  implementations  of 
Cartesian  product  (both  concatenation  and  arbitrary 
interleaving),  and  context  free  sets  are  closed  under 
concatenation  but  only  some  interleavings.  So  if  C  is 
regular,  any  implementation  of  Cartesian  product  can 
be  used,  but  if  C  is  context  free,  then  the  choices  are 
more  restricted. 

As  a  practical  matter,  C  should  be  closed  under  in¬ 
tersection  and  P  under  union  in  order  to  support  the 
integration  operator.  This  effectively  restricts  C  to  be 
regular  and  P  to  be  at  most  context  free.  This  also 
maximizes  the  choices  of  the  Cartesian  product  imple¬ 
mentation.  However,  it  is  possible  for  C  to  be  context 
free  and  P  to  be  regular  if  the  C  set  of  at  most  one 
of the $\langle H, C, P \rangle$ triples being integrated is context free
and  the  rest  are  regular.  This  follows  from  the  clo¬ 
sure  of  context  free  languages  under  intersection  with 
regular  grammars. 

Other ways of selecting C and P are summarized
in Table 2. This table assumes that C is closed
under Cartesian product. As one interesting case, if the
representation for P can express only the empty set,
then the solution set is just C, so C can be r.e. The
restriction that $(C \times C) \cap P$ be at most context free is
still satisfied, since $(C \times C) \cap P$ is always the empty set,
and therefore well within the context free languages.

Table 1: Closure Under Operations Needed to Compute the Solution Set. (Columns: language classes, including Regular, DCFL, and r.e.; rows: set operations, including complement and projection (homomorphisms).)
RS-KII 

Instantiating KII with different set representations
produces algorithms with different computational
complexities and abilities to utilize knowledge. One
instantiation that seems to strike a good balance between
computational cost and expressiveness represents H,
C, and P as regular sets. This instantiation is called
RS-KII.

RS-KII is a good multi-strategy algorithm, in that it
can utilize various knowledge and strategies, depending
on what knowledge is integrated, and how it is
translated. Existing algorithms can be emulated by creating
translators for the knowledge and strategies of that
algorithm, and integrating the resulting (H, C, P)
tuples. Hybrid multi-strategy algorithms can be created
by translating and integrating additional knowledge,
or by integrating novel combinations of knowledge for
which translators already exist.

Creating  algorithms  by  writing  translators  for  indi¬ 
vidual  knowledge  fragments  and  integrating  them  to¬ 
gether  can  be  easier  than  writing  new  induction  al¬ 
gorithms.  Algorithms  can  be  constructed  modularly 
from  translators,  which  allows  knowledge  fragments  to 
be  easily  added  or  removed.  By  contrast,  modifications 
made  to  an  algorithm  in  order  to  utilize  one  knowledge 
fragment  may  have  to  be  discarded  in  order  to  utilize 
a  second  fragment. 

The  remainder  of  this  section  demonstrates  how  RS- 
KII  can  emulate  AQ-11  with  a  beam  width  of  one 
(Michalski  1978),  and  how  RS-KII  can  integrate  addi¬ 
tional  knowledge,  namely  an  overgeneral  domain  the¬ 
ory  and  noisy  examples,  to  create  a  hybrid  algorithm. 
AQ-11 with larger beam widths is not demonstrated,
since it is not clear how to express the corresponding
bias as a regular grammar. This bias may
require a more powerful set representation.

When  using  only  the  AQ-11  knowledge,  RS-KII 
induces  the  same  hypotheses  as  AQ-11,  albeit  at  a 
slightly  worse  computational  complexity.  When  utiliz¬ 
ing  the  additional  knowledge,  RS-KII  induces  a  more 
accurate  hypothesis  than  AQ-11,  and  does  so  more 
quickly. 


RS-KII translators can be written for other knowledge
as well, though space restrictions prevent any detailed
discussion. Of note, RS-KII translators can be
constructed for all biases expressible as version spaces
(for certain classes of hypothesis spaces) (Smith 1995).
It also looks likely that RS-KII translators can be
constructed for the knowledge used by other induction
algorithms, though this is an area for future research.

Translators  for  AQ-11  Biases 

The biases used by AQ-11 are strict consistency with
the examples, and a user-defined lexicographic evaluation
function (LEF). The LEF totally orders the hypotheses
according to user-defined criteria. The induced
hypothesis is one that is consistent with all of
the examples, and is a (possibly local) maximum of the
LEF. A translator is demonstrated in which the LEF is
an information gain metric, as used in algorithms such
as ID3 (Quinlan 1986).

Hypotheses are sentences in the VL1 language
(Michalski 1974). There are k features, denoted f1
through fk, where feature fi can take values from the
set Vi. A hypothesis is a disjunction of terms, a term
is a conjunction of selectors, and a selector is of the
form [fi rel vi], where vi is in Vi and rel is a relation
in {<, ≤, =, ≠, ≥, >}. A specific hypothesis space in
VL1 is specified by the list of features and their values,
and is denoted VL1((f1, V1), ..., (fk, Vk)).

An instance is a vector of k values, (x1, x2, ..., xk),
where xi is a value in Vi. A selector [fi rel vi] is
satisfied by an example if and only if xi rel vi. A hypothesis
covers an example if the example satisfies the
hypothesis.
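
The coverage semantics just described can be sketched in Python. This is a minimal illustration rather than code from the paper: the names Selector, satisfies, term_covers and covers, the dictionary encoding of instances, and the exact set of relations are all assumptions made for the example.

```python
import operator
from typing import Dict, List, NamedTuple

# Relations a selector [f rel v] may use (assumed set, for illustration).
RELATIONS = {
    "<": operator.lt, "<=": operator.le, "=": operator.eq,
    "!=": operator.ne, ">=": operator.ge, ">": operator.gt,
}


class Selector(NamedTuple):
    feature: str
    rel: str
    value: int


Instance = Dict[str, int]    # feature name -> observed value
Term = List[Selector]        # conjunction of selectors
Hypothesis = List[Term]      # disjunction of terms


def satisfies(instance: Instance, selector: Selector) -> bool:
    """True if the instance's value for the feature stands in rel to the value."""
    return RELATIONS[selector.rel](instance[selector.feature], selector.value)


def term_covers(term: Term, instance: Instance) -> bool:
    """A term covers an instance when every selector is satisfied."""
    return all(satisfies(instance, s) for s in term)


def covers(hypothesis: Hypothesis, instance: Instance) -> bool:
    """A hypothesis covers an instance when at least one term does."""
    return any(term_covers(t, instance) for t in hypothesis)


small_plastic = [[Selector("plastic", "=", 1), Selector("size", "<", 5)]]
cup = {"plastic": 1, "size": 3}
bucket = {"plastic": 1, "size": 7}
```

Here covers(small_plastic, cup) holds while covers(small_plastic, bucket) does not, since the bucket fails the size selector.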

Strict Consistency with Examples. A bias for
strict consistency with a positive example can be
expressed as a constraint that the induced hypothesis
must cover the example. Similarly, strict consistency
with a negative example constrains the induced
hypothesis not to cover the example. Each of these
constraints is expressed as a regular grammar that only
recognizes hypotheses that satisfy the constraint. The
regular expression for the set of VL1 hypotheses
covering an example, Covers(H, e), is shown in Figure 2.
The sets of values in covering-selector are all regular
sets. For example, the set of integers less than
100 is (0-9)|((1-9)(0-9)). There is an algorithm
that generates each of these sets given the relation and
the bounding number, but it is omitted for brevity.
The complement of Covers(H, e) is Excludes(H, e), the
set of hypotheses in H that do not cover example e.
These two regular grammars implement the translators
for positive and negative examples in the VL1 hypothesis
space language, as shown in Figure 3. The translator takes
as input the list of features and their values, and the
example.

Table 2: Summary of Expressiveness Bounds.

Figure 3: Example Translators for VL1.
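
The paper omits the algorithm that generates these value sets. As one possible illustration, the sketch below builds a regular expression for the non-negative integers (written without leading zeros) that are less than a given bound. The function name less_than_regex and the construction itself are assumptions for this example, not the authors' algorithm.

```python
import re


def less_than_regex(bound: int) -> str:
    """Regular expression matching decimal integers v with 0 <= v < bound."""
    if bound < 1:
        raise ValueError("no non-negative integer is less than the bound")
    digits = str(bound)
    n = len(digits)
    alternatives = []
    # Every integer with fewer digits than the bound is smaller than it.
    for length in range(1, n):
        if length == 1:
            alternatives.append("[0-9]")
        else:
            alternatives.append("[1-9][0-9]{%d}" % (length - 1))
    # Same number of digits: share a prefix with the bound, then take a
    # strictly smaller digit, then allow anything in the remaining places.
    for i, ch in enumerate(digits):
        lowest = 1 if (i == 0 and n > 1) else 0   # no leading zero
        highest = int(ch) - 1
        if highest >= lowest:
            alternatives.append(
                digits[:i] + "[%d-%d]" % (lowest, highest)
                + "[0-9]{%d}" % (n - i - 1)
            )
    return "|".join(alternatives)


def is_less_than(value: str, bound: int) -> bool:
    """Membership test for the regular set of integers below the bound."""
    return re.fullmatch(less_than_regex(bound), value) is not None
```

For a bound of 100 this yields [0-9]|[1-9][0-9]{1}, which matches the same strings as the expression quoted in the text.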

The LEF. AQ-11 performs a beam search of the
hypothesis space to find a hypothesis that maximizes the
LEF, or is at least a good local approximation. AQ-11
returns the first hypothesis visited by this search that
is also consistent with the examples. This is a bias
towards hypotheses that come earlier in the search order.

This bias can be expressed as an (H, C, P) tuple in
which C = H (i.e., no hypotheses are rejected), and
P is a partial ordering over the hypothesis space in
which (a, b) is in P if and only if hypothesis a comes
after hypothesis b in the search order (i.e., a is less
preferred than b).

The  search  order  of  a  beam  search  is  difficult,  and 
perhaps  impossible,  to  express  as  a  regular  grammar. 
However,  with  a  beam  width  of  one,  beam  search  be¬ 
comes  hill  climbing,  which  can  be  expressed  as  a  regu¬ 
lar  grammar. 

In hill climbing, single selector extensions of the
current best hypothesis are evaluated by some evaluation
function, f, and the extension with the best evaluation
becomes the next current best hypothesis. Given two
terms, t1 = a1 a2 ... an and t2 = b1 b2 ... bm, where ai
and bi are selectors, t1 is visited before t2 if the first
k - 1 extensions of t1 and t2 are the same, but on the
k-th extension, either t1 has a better evaluation than t2,
or t1 has no more selectors. Formally, there is either
some extension k ≤ min(m, n) such that for all i < k,
ai = bi and f(a1 ... ak) < f(b1 ... bk) (lower evaluations
being better), or n < m and the first n selectors of t1
and t2 are the same.

This is equivalent to saying that the digit string
f(a1) · f(a1 a2) · ... · f(a1 a2 ... an) comes before the
digit string f(b1) · f(b1 b2) · ... · f(b1 b2 ... bm) in
dictionary (lexicographic) order. This assumes that low
evaluations are best, and that the evaluation function
returns a unique value for each term; that is,
f(a1 ... am) = f(b1 ... bm) if and only if ai = bi for
all i between one and m. This can be ensured by
assigning a unique id to each selector, and appending
the id for the last selector in the term to the end of the
term's evaluation. The evaluations of two terms are
compared after each extension until one partial term
either has a better evaluation, or terminates.
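
The reduction of the hill-climbing visit order to dictionary order can be illustrated with a short Python sketch. The evaluation function toy_eval, its weights, and the selector ids are invented for the example. The only properties taken from the text are that evaluations are fixed-length digit strings, that lower is better, and that the id of the last selector is appended to keep evaluations unique.

```python
from typing import Callable, Tuple

Term = Tuple[str, ...]   # a term as a sequence of selector names

# Hypothetical per-selector weights and unique ids (assumptions for the demo).
WEIGHT = {"a": 3, "b": 1, "c": 2}
SELECTOR_ID = {"a": 0, "b": 1, "c": 2}


def toy_eval(prefix: Term) -> str:
    """Fixed-length evaluation: two score digits, then the last selector's id."""
    score = sum(WEIGHT[s] for s in prefix)
    return f"{score:02d}{SELECTOR_ID[prefix[-1]]:02d}"


def digit_string(term: Term, f: Callable[[Term], str]) -> str:
    """Concatenate f(a1), f(a1 a2), ..., f(a1 ... an)."""
    return "".join(f(term[:k]) for k in range(1, len(term) + 1))


def visited_before(t1: Term, t2: Term, f: Callable[[Term], str]) -> bool:
    """t1 precedes t2 exactly when its digit string is lexicographically smaller."""
    return digit_string(t1, f) < digit_string(t2, f)
```

Because every evaluation has the same length, comparing the concatenated strings compares the terms extension by extension, and a term that is a proper prefix of another comes first.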

A regular grammar can be constructed that recognizes
pairs of hypotheses, (h1, h2), if h1 is visited
before h2 in the search. This is done in two steps. First,
a grammar is constructed that maps each hypothesis
onto digit strings of the kind described above. The
digit strings are then passed to a regular grammar that
recognizes pairs of digit strings, (d1, d2), such that d1
comes before d2 in dictionary order. This is equivalent
to substituting the mapping grammar into the
dictionary ordering grammar. Since regular grammars
are closed under substitution, the resulting grammar
is also regular (Hopcroft & Ullman 1979).

The digit string comparison grammar is the simpler
of the two, so it will be described first. This grammar
recognizes pairs of digit strings, (x, y), such that
x comes before y lexicographically. A special termination
symbol, #, is appended to each string, and the resulting
strings are interleaved so that their symbols alternate.
The interleaved string is given as input to the
grammar specified by the regular expression EQUAL*
LESS-THAN ANY*, where EQUAL = (00|11|##),
LESS-THAN = (01|#0|#1) and ANY = (0|1|#). This expression
assumes a binary digit string, but can be easily
extended to handle base ten numbers.

Figure 2: Regular Expression for the Set of VL1 Hypotheses Covering an Instance.
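
The comparison grammar can be tried out directly with Python's re module. The sketch below is an illustration under two assumptions that the text does not spell out: the interleaving takes one symbol from each string in turn, and once the shorter string is exhausted the remaining symbols of the longer one are simply appended. The names interleave and comes_before are made up for the example.

```python
import re

# EQUAL* LESS-THAN ANY* over the interleaved, #-terminated strings.
LEX_LESS = re.compile(r"(?:00|11|##)*(?:01|#0|#1)[01#]*")


def interleave(x: str, y: str) -> str:
    """Alternate the symbols of x# and y#, starting with x."""
    a, b = x + "#", y + "#"
    out = []
    for i in range(max(len(a), len(b))):
        if i < len(a):
            out.append(a[i])
        if i < len(b):
            out.append(b[i])
    return "".join(out)


def comes_before(x: str, y: str) -> bool:
    """True if binary string x precedes y in dictionary order."""
    return LEX_LESS.fullmatch(interleave(x, y)) is not None
```

The pair #0 or #1 captures the case where x ends first, so a proper prefix sorts before its extensions. A pair such as 1# matches neither EQUAL nor LESS-THAN, so the match fails whenever y ends first or is smaller at the first difference.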

The mapping of a hypothesis onto a digit string is
accomplished by a Moore machine, that is, a DFA that has
an output string associated with each state. Recall
that the digit string for a term, a1 a2 ... am, is f(a1) ·
f(a1 a2) · ... · f(a1 a2 ... am). The machine takes a
hypothesis as input. After reading each selector, it
outputs the evaluation string for the current partial
term. So after seeing a1, it prints f(a1). After seeing
a2 it prints f(a1 a2), and so on until it has printed the
digit string for the term. When the end of the term
is encountered (i.e., an "or" symbol is seen), the DFA
returns to the initial state and repeats the process for
the next term. The evaluation function must return a
fixed-length string of digits.

A  Moore  machine  can  only  have  a  finite  number  of 
states.  It  needs  at  least  one  state  for  each  selector. 
It  must  also  remember  enough  about  the  previous  se¬ 
lectors  in  the  term  to  compute  the  term’s  evaluation. 
Since  terms  can  be  arbitrarily  long,  no  finite  state  ma¬ 
chine  can  remember  all  of  the  previous  selectors  in  the 
term.  However,  the  evaluation  function  can  often  get 
by  with  much  less  information. 

For example, when the evaluation function is an
information metric, the evaluation of a partial term,
a1 a2 ... ak, depends only on the number of positive and
negative examples covered by the term. This can be
represented by 2^n states, where n is the number of
examples. In this case, a state in the Moore machine
is an n digit binary number, where the i-th digit
indicates whether or not the i-th example is covered by the
term. In the initial state, all of the examples are
covered. When a selector is seen, the digits corresponding
to examples that are not covered by the selector are
turned off. The binary vector for the state indicates
which examples are covered, and the output string for
the state is the information corresponding to that
coverage of the examples.^1 When an "or" is seen, the DFA
prints a zero to indicate end-of-term, and returns to
the initial state.

This Moore machine is parameterized by the list of
examples and the evaluation function f. This machine
is substituted into the regular expression for comparing
digit strings. The resulting DFA recognizes a pair
of hypotheses, (h1, h2), if and only if h1 comes before
h2 in the hill climbing search.

Although the machine has an exponential number
of states, they do not need to be represented
extensionally. All that must be maintained is the current
state (an n digit binary number). The next state can
be computed from the current state and a selector by
determining which examples are not covered by the
selector, and turning off those bits. This requires at
most O(n) space and O(mn) time to evaluate a
hypothesis, where n is the number of examples, and m is
the number of selectors in the hypothesis.
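
A minimal sketch of this implicit Moore machine follows, with the state held as an n-bit integer. The output function here is a hypothetical stand-in (a count of misclassified examples) rather than the information metric used in the paper, and all names are assumptions for illustration.

```python
from typing import Callable, Dict, List, Sequence

Example = Dict[str, int]
Selector = Callable[[Example], bool]   # True if the selector covers the example


def initial_state(n: int) -> int:
    """All n bits on: before any selector is read, every example is covered."""
    return (1 << n) - 1


def next_state(state: int, selector: Selector, examples: Sequence[Example]) -> int:
    """Turn off the bit of each example the selector does not cover (O(n))."""
    for i, example in enumerate(examples):
        if (state >> i) & 1 and not selector(example):
            state &= ~(1 << i)
    return state


def output(state: int, is_positive: Sequence[bool], width: int = 3) -> str:
    """Fixed-width evaluation string for a state; lower is better.

    Hypothetical metric: covered negatives plus uncovered positives.
    """
    errors = 0
    for i, positive in enumerate(is_positive):
        covered = bool((state >> i) & 1)
        if covered != positive:
            errors += 1
    return str(errors).zfill(width)


def term_digit_string(term: List[Selector], examples: Sequence[Example],
                      is_positive: Sequence[bool]) -> str:
    """Emit one evaluation per selector read: O(mn) time, O(n) state."""
    state = initial_state(len(examples))
    pieces = []
    for selector in term:
        state = next_state(state, selector, examples)
        pieces.append(output(state, is_positive))
    return "".join(pieces)


def plastic(example: Example) -> bool:
    return example["plastic"] == 1


def small(example: Example) -> bool:
    return example["size"] < 5


examples = [{"size": 3, "plastic": 1}, {"size": 7, "plastic": 1},
            {"size": 3, "plastic": 0}]
is_positive = [True, False, False]
```

Only the current bit vector is stored, so the exponentially many states are never enumerated. Swapping in a different evaluation only requires replacing output, provided it depends solely on which examples are covered.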

The translator for this knowledge source takes as
input the hypothesis space, the list of examples, and
an evaluation function, f. The function f takes as
input the number of covered and uncovered examples,
and outputs a fixed length non-negative integer. The
translator returns (H, H, P), where P is the grammar
described above. (H, H, P) prefers hypotheses that are
visited earlier by hill climbing with evaluation function
f. This kind of bias is used in a number of induction
algorithms, so this translator can be used for them as
well.

Although the logic behind the LEF translator is
rather complex, the translator itself is fairly
straightforward to write. The Moore machine requires only a
handful of lines of code to implement the next-state and
output functions, and the digit-string comparison
grammar is a simple regular expression. The design effort
also transfers to other biases. The evaluation function
can be changed, so long as it only needs to know
which examples are covered by the current term, and
the basic design can be reused for translators of similar
biases.

^1 Since information is a real number between -1 and 1, and
the output must be a fixed-length non-negative integer,
the output string for a state is the integer portion of
(info + 1.0) multiplied by a fixed power of ten, where info
is the information of the example partitioning represented
by the n digit number for that state.

Some of the difficulty in designing the LEF translator
may be because the bias is designed for use in a
hypothesis space search paradigm, and does not translate
well to RS-KII. Bear in mind that the beam search
is an approximation of another bias, namely that the
induced hypothesis should maximize the LEF. Finding
a maximal hypothesis is intractable, so AQ-11
approximates it with a beam search. This particular
approximation was chosen because it is easy to implement in
the hypothesis-space search paradigm. However, RS-KII
uses a different paradigm, so a different approximation
of the “maximize the LEF” bias that is easier
to express in RS-KII may be more appropriate.

Translators  for  Novel  Biases 

The  following  translators  are  for  biases  that  AQ-11 
does  not  utilize,  namely  consistency  with  one  class  of 
noisy  examples,  and  an  assumption  that  the  target  hy¬ 
pothesis  is  a  specialization  of  an  overgeneral  domain 
theory. 

Noisy  Examples  with  Bounded  Inconsistency 
Bounded  inconsistency  (Hirsh  1990)  is  a  kind  of  noise 
in  which  each  feature  of  the  example  can  be  wrong  by 
at  most  a  fixed  amount.  For  example,  if  the  width 
value  for  each  instance  is  measured  by  an  instrument 
with  a  maximum  error  of  ±0.3mm,  then  the  width  val¬ 
ues  for  these  instances  have  bounded  inconsistency. 

The idea for translating examples with bounded
inconsistency is to use the error margin to work
backwards from the noisy example to compute the set of
possible noise-free examples. One of these examples
is the correct noise-free version of the observed
example, into which noise was introduced to produce the
observed noisy example. The target concept is strictly
consistent with this noise-free example.

Let e be the noisy observed example, E be the set
of noise-free examples from which e could have been
generated, and let e' be the correct noise-free example
from which e was in fact generated. Since it is unknown
which example in E is e', a noisy example is translated
as (H, C, {}), where C is the set of hypotheses that are
strictly consistent with one or more of the examples
in E. Hypotheses that are consistent with none of the
examples in E are not consistent with e', and therefore
not the target concept. This is the approach used by
Hirsh (1990) in IVSM to translate noisy examples
with bounded inconsistency.

Figure 4: RS-KII Translator for Positive Examples
with Bounded Inconsistency.


This suggests the following RS-KII translator for
examples with bounded inconsistency. The set of
possible noise-free examples, E, is computed from
the noisy example and the error margins for each
feature. Each example, ei, in this set is translated
using one of the RS-KII translators for noise-free
examples, either TranPosAQExample(H, ei) or
TranNegAQExample(H, ei), which translates example
ei into (H, Ci, {}), where Ci is the set of hypotheses
that are strictly consistent with ei. The translator
for the bounded inconsistent example returns
(H, C = ∪_i Ci, {}). C is the set of hypotheses
consistent with at least one of the examples in E.

The set E is computed from the observed example,
(x1, x2, ..., xk), and the error margins for each feature,
±δ1 through ±δk, as follows. If the observed value
for feature fi is xi, and the error margin is ±δi, then
the correct value for feature fi is in
{v | xi - δi ≤ v ≤ xi + δi}. Call this set [xi, ±δi] for
short. Since instances are ordered vectors of feature
values, E is [x1, ±δ1] × [x2, ±δ2] × ... × [xk, ±δk].

A translator for examples with bounded inconsistency
based on this approach is shown in Figure 4.
It takes as input a VL1 hypothesis space (H), the
error margin for each feature (±δ1 through ±δk) and an
instance. Negative examples are translated similarly,
except that TranNegAQExample(H, ei) is used.
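
The construction of E and the resulting constraint can be sketched as follows. This sketch assumes integer-valued features so that each interval [xi, ±δi] is finite, and it tests membership in C directly instead of building a regular grammar as RS-KII does. The function names are invented for the illustration.

```python
from itertools import product
from typing import Callable, Iterator, Sequence, Tuple

Instance = Tuple[int, ...]
Covers = Callable[[Instance], bool]   # does a given hypothesis cover an instance?


def noise_free_candidates(observed: Sequence[int],
                          margins: Sequence[int]) -> Iterator[Instance]:
    """E = [x1, ±d1] × ... × [xk, ±dk] for integer-valued features."""
    intervals = [range(x - d, x + d + 1) for x, d in zip(observed, margins)]
    return product(*intervals)


def consistent_with_noisy_positive(covers: Covers, observed: Sequence[int],
                                   margins: Sequence[int]) -> bool:
    """In C if the hypothesis covers at least one candidate in E."""
    return any(covers(e) for e in noise_free_candidates(observed, margins))


def consistent_with_noisy_negative(covers: Covers, observed: Sequence[int],
                                   margins: Sequence[int]) -> bool:
    """In C if the hypothesis excludes at least one candidate in E."""
    return any(not covers(e) for e in noise_free_candidates(observed, margins))


def small(instance: Instance) -> bool:
    """Hypothetical hypothesis [size < 5], with size as the first feature."""
    return instance[0] < 5
```

For an observed size of 5 with a margin of 1, the candidates are sizes 4, 5 and 6, so the hypothesis [size < 5] remains consistent with the positive example through the candidate of size 4.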

Domain Theory. A domain theory encodes background
knowledge about the target concept as a collection
of Horn-clause inference rules that explain why an
instance is a member of the target concept. The way
in which this knowledge biases induction depends on
assumptions about the correctness and completeness of
the theory. Each of these assumptions requires a
different translator, since the biases map onto different
constraints and preferences.

A translator for a particular overgeneral domain theory
is described below. The theory being translated is
derived from the classic “cup” theory (Mitchell, Keller,
& Kedar-Cabelli 1986; Winston et al. 1983), and is
shown in Figure 5. It expands into a set of sufficient
conditions for cup(X), as shown in Figure 6. The
translator assumes that the target concept is a specialization
of the theory. In this case, the actual target concept
is “plastic cups without handles,” which corresponds
to condition one, but this information is not provided
to the translator. All the translator knows is that the
target concept can be described by a disjunction of one
or more of the sufficient conditions in the cup theory.

Figure 5: CUP Domain Theory.

1. cup(X) :- plastic(X), small(X), cylindrical(X),
             flat_bottom(X), open_top(X).
2. cup(X) :- china(X), small(X), cylindrical(X),
             flat_bottom(X), open_top(X).
3. cup(X) :- metal(X), small(X), cylindrical(X),
             flat_bottom(X), open_top(X).
4. cup(X) :- plastic(X), small(X), has_handle(X),
             flat_bottom(X), open_top(X).
5. cup(X) :- metal(X), small(X), has_handle(X),
             flat_bottom(X), open_top(X).
6. cup(X) :- china(X), small(X), has_handle(X),
             flat_bottom(X), open_top(X).

Figure 6: Sufficient Conditions of the CUP Theory.

plastic(x)      →  [plastic = true]
china(x)        →  [china = true]
metal(x)        →  [metal = true]
has_handle(x)   →  [has-handle = true]
cylindrical(x)  →  [cylindrical = true]
small(x)        →  [size < 5]
flat_bottom(x)  →  [flat-bottom = true]
open_top(x)     →  [open-top = true]

Figure 7: Selectors Corresponding to Predicates in CUP
Theory.

The translator takes the theory and hypothesis space
as input, and generates the tuple (H, C, {}), where C is
satisfied by hypotheses equivalent to a disjunction of one
or more of the theory's sufficient conditions. In general,
the hypothesis space language may differ from
the language of the conditions, making it difficult to
determine equivalence. However, for the VL1 language
of AQ-11, the languages are similar enough that simple
syntactic equivalence will suffice, modulo a few
cosmetic changes. Specifically, the predicates in the
sufficient conditions are replaced by corresponding
selectors. All disjuncts of the resulting conditions are
VL1 hypotheses. The mappings are shown in Figure 7.
In general, the predicates are Boolean valued, and
are replaced by Boolean valued selectors. To show
that other mappings are also possible, the predicate
small(x) is replaced by the selector [size < 5].

The grammar for C is essentially the grammar for
the cup theory, with a few additional rules. First,
the cup theory is written as a context free grammar
that generates the sufficient conditions. If the grammar
does not have certain kinds of recursion, as is the
case in the CUP theory, then it is in fact a regular
grammar. In this case, the grammar for C will also be
regular. Otherwise, the grammar for C will be context
free. This limits the theories that can be utilized by
RS-KII. However, RS-KII could be extended to utilize
a context free theory by allowing the C set of at most
one (H, C, P) tuple to be context free. This would be
a different instantiation of KII, but still within the
expressiveness limits discussed in the previous section.

C               →  TERM | C or TERM
TERM            →  CONDITION
CONDITION       →  CUP(x)
PLASTIC(x)      →  [plastic = true]
CHINA(x)        →  [china = true]
METAL(x)        →  [metal = true]
HAS-HANDLE(x)   →  [has-handle = true]
CYLINDRICAL(x)  →  [cylindrical = true]
SMALL(x)        →  [size < 5]
FLAT-BOTTOM(x)  →  [flat-bottom = true]
OPEN-TOP(x)     →  [open-top = true]

Figure 8: Grammar for VL1 Hypotheses Satisfying the
CUP Theory Bias.

Once the theory has been written as a grammar,
rewrite rules are added that map each terminal predicate
(those that appear in the sufficient conditions)
onto the corresponding selector(s). This grammar
generates VL1 hypotheses equivalent to each of the
sufficient conditions. To get all possible disjuncts, rules
are added that correspond to the regular expression
CONDITION (or CONDITION)*, where CONDITION is the
head of the domain-theory grammar described above.

The grammar for C discussed above is shown in
Figure 8. This grammar is a little less general than it could
be, since it does not allow all permutations of the
selectors within each term. However, the more general
grammar contains considerably more rules, and
permuting the selectors does not change the semantics
of a hypothesis. In this grammar, the nonterminal
CUP(x) is the head of the cup domain-theory
grammar, which has the same structure as the theory
shown in Figure 5.
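
Because the cup theory has no recursion, the constraint set C can be written as an ordinary regular expression. The Python sketch below is only an illustration of the CONDITION (or CONDITION)* construction: the textual encoding of hypotheses (selectors separated by single spaces, terms separated by " or ") and all names are assumptions, and selectors appear in the fixed order of the sufficient conditions.

```python
import re

# Predicate -> selector mapping for the cup theory.
SELECTOR = {
    "plastic": "[plastic = true]",
    "china": "[china = true]",
    "metal": "[metal = true]",
    "has_handle": "[has-handle = true]",
    "cylindrical": "[cylindrical = true]",
    "small": "[size < 5]",
    "flat_bottom": "[flat-bottom = true]",
    "open_top": "[open-top = true]",
}

# The six sufficient conditions: each material paired with each shape cue.
SUFFICIENT_CONDITIONS = [
    [material, "small", shape, "flat_bottom", "open_top"]
    for shape in ("cylindrical", "has_handle")
    for material in ("plastic", "china", "metal")
]


def term_for(condition):
    """Rewrite a sufficient condition as a VL1 term (assumed text encoding)."""
    return " ".join(SELECTOR[predicate] for predicate in condition)


def constraint_regex():
    """CONDITION (or CONDITION)* over the rewritten sufficient conditions."""
    condition = "|".join(re.escape(term_for(c)) for c in SUFFICIENT_CONDITIONS)
    return re.compile(rf"(?:{condition})(?: or (?:{condition}))*")


C = constraint_regex()
plastic_cylinder = term_for(SUFFICIENT_CONDITIONS[0])
plastic_handle = term_for(SUFFICIENT_CONDITIONS[3])
```

A hypothesis such as plastic_cylinder + " or " + plastic_handle is accepted by C.fullmatch, while the overgeneral "[plastic = true] [cylindrical = true]" is rejected because it is not a disjunction of sufficient conditions.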

Enumerating  the  Solution  Set 

The  solution  set  is  a  regular  grammar  computed  from 
C  and  P,  as  was  shown  in  Equation  2.  A  regular  gram¬ 
mar  is  equivalent  to  a  deterministic  finite  automaton 


(DFA).  One  straightforward  way  to  enumerate  a  string 
from  the  solution  set  is  to  search  the  DFA  for  a  path 
from  the  start  state  to  an  accept  state.  However,  the 
DFA  computed  by  the  solution-set  equation  from  C 
and  P  can  contain  dead  states,  from  which  there  is  no 
path  to  an  accept  state.  These  dead  states  can  cause 
a  large  amount  of  expensive  backtracking. 

There  is  a  second  approach  that  can  reduce  back¬ 
tracking  by  making  better  use  of  the  dominance  infor¬ 
mation  in  P.  The  solution  set  consists  of  the  undom¬ 
inated  strings  in  C,  where  P  is  the  dominance  rela¬ 
tion.  Strings  in  this  set  can  be  enumerated  by  search¬ 
ing  C  with  branch-and-bound  (Kumar  1992).  The  ba¬ 
sic  branch-and-bound  search  must  be  modified  to  use 
a  partially  ordered  dominance  relation  rather  than  a 
totally  ordered  one,  and  to  return  multiple  solutions 
instead  of  just  one.  These  modifications  are  relatively 
straightforward,  and  are  described  in  (Smith  1995). 
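
For a finite, explicitly enumerated C, the notion of an undominated string can be made concrete with a brute-force sketch. This is not the branch-and-bound procedure of (Smith 1995); it only illustrates which elements that search is looking for. The name solution_set and the set-of-pairs encoding of P are assumptions, with (a, b) in P meaning that a is less preferred than b.

```python
from typing import Hashable, Iterable, Set, Tuple


def solution_set(C: Iterable[Hashable],
                 P: Set[Tuple[Hashable, Hashable]]) -> set:
    """Members of C that no other member of C is preferred to."""
    candidates = set(C)
    # (a, b) in P dominates a only when both a and b satisfy the constraints.
    dominated = {a for (a, b) in P if a in candidates and b in candidates}
    return candidates - dominated


hypotheses = {"h1", "h2", "h3"}
preferences = {("h1", "h2"), ("h2", "h3"), ("h3", "h4")}
```

In this example h3 is the only undominated hypothesis: the pair ("h3", "h4") has no effect because h4 is not in C.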

Although the worst-case complexity of branch-and-bound
is the same as a blind search of the solution-set
DFA, the complexity of enumerating the first few
hypotheses with branch-and-bound can be significantly
less. Since for most applications only one or two
hypotheses are ever needed, RS-KII uses
branch-and-bound.

Results 

By  combining  biases,  different  induction  algorithms 
can  be  generated.  AQ-11  uses  the  biases  of  strict  con¬ 
sistency  with  examples,  and  prefers  hypotheses  that 
maximize  the  LEF.  When  using  only  these  biases,  both 
RS-KII  and  AQ-11  with  a  beam  width  of  one  induce 
the  same  hypotheses,  though  RS-KII  is  slightly  more 
computationally  expensive.  The  complexity  of  AQ-11 
with  a  beam-size  of  one  is  0(e^A;),  where  e  is  the  num¬ 
ber  of  examples  and  k  is  the  number  of  features.  The 
complexity  of  RS-KII  when  using  only  AQ-11  biases 
is  O(e^fe^).  These  derivations  can  be  found  in  (Smith 
1995),  and  generally  follow  the  complexity  derivations 
for  AQ-11  in  (Clark  &  Niblett  1989).  RS-KII  is  a  little 
more  costly  because  it  assumes  that  the  LEF  bias,  en¬ 
coded  by  P,  is  a  partial  order,  where  it  is  in  fact  a  total 
order.  This  causes  RS-KII  to  make  unnecessary  com¬ 
parisons  that  AQ-11  avoids.  One  could  imagine  a  ver¬ 
sion  of  RS-KII  which  used  information  about  whether 
P  was  a  total  order  or  a  partial  order. 

RS-KII's strength lies in its ability to utilize
additional knowledge, such as the domain theory and noisy
examples with bounded inconsistency. When the
domain theory translator is added, RS-KII's complexity
drops considerably, since the hypothesis space is
reduced to a relative handful of hypotheses by the strong
bias of the domain theory. The concept induced by
RS-KII is also more accurate than that learned by AQ-11,
which cannot utilize the domain theory. When given
the four examples of the concept “plastic cups without
handles,” as shown in Table 3, AQ-11 learns the
overgeneral concept

[plastic = true] [cylindrical = true]

which includes many non-cups, whereas RS-KII learns
the correct concept:

[plastic = true] [cylindrical = true] [size < 5]
[flat-bottom = true] [open-top = true]

The additional bias from the domain theory makes this
the shortest concept consistent with the four examples.

Table 3: Examples for the CUP Task.

RS-KII can also handle noisy examples with
bounded inconsistency. For the cup domain, assume
that the size can be off by at most one. Let the size
feature of example e2 be six instead of five. AQ-11
would fail to induce a hypothesis at all, since there is
no hypothesis consistent with all four examples. When
using the bounded-inconsistency translator for examples,
RS-KII can induce a hypothesis, namely the same
one learned above with noise-free examples. In general,
noisy examples introduce uncertainty, which can
increase the size of the solution set and decrease the
accuracy of the learned hypothesis. Additional knowledge
may be necessary to mitigate these effects. In this
case, however, the domain theory bias is sufficiently
strong, and the noise sufficiently weak, that no
additional knowledge is needed.

The ability to utilize additional knowledge allows
RS-KII to induce hypotheses in situations where AQ-11
cannot, and allows RS-KII to induce more accurate
hypotheses. RS-KII can also make use of knowledge
other than that shown here, provided that appropriate
translators are written.


Precursors  to  KII 

KII has its roots in two knowledge integration systems,
Incremental Version Space Merging (Hirsh 1990) and
Grendel (Cohen 1992). These systems can also be
instantiated from KII, given appropriate set
representations. These systems and their relation to KII are
described below.

IVSM. Incremental Version Space Merging (IVSM)
(Hirsh 1990) was one of the first knowledge integration
systems for induction, and provided much of the
motivation for KII. IVSM integrates knowledge by
translating each knowledge fragment into a version space
of hypotheses consistent with the knowledge, and then
intersecting these version spaces to obtain a version
space consistent with all of the knowledge. Version
spaces map onto (H, C, P) tuples in which C is a
version space in the traditional [S, G] representation, and
P is the empty set (i.e., no preference information).

KII expands on IVSM by extending the space of
set representations from the traditional [S, G]
representation, and a handful of alternative
representations (e.g., (Hirsh 1992; Smith & Rosenbloom 1990;
Subramanian & Feigenbaum 1986)), to the space of
all possible set representations. KII also expands on
IVSM by allowing knowledge to be expressed in terms
of preferences as well as constraints, thereby
increasing the kinds of knowledge that can be utilized. KII
strictly subsumes IVSM, in that IVSM can be cast as
an instantiation of KII in which C is a version space
in one of the possible representations, and P is expressed
in the null representation, which can only represent the
empty set.
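
The role of preferences alongside constraints can be sketched as follows. This is an illustration under stated assumptions, not the KII definition: it assumes that P is a set of (worse, better) pairs, that integrating two fragments intersects their constraint sets and unions their preference sets, and that the solution set consists of the hypotheses in C to which no other hypothesis in C is preferred. The function names and the hypotheses `h1` to `h4` are hypothetical.

```python
def integrate_tuples(fragment_a, fragment_b):
    """Combine two (C, P) pairs: intersect constraints, union preferences (assumed rule)."""
    (c_a, p_a), (c_b, p_b) = fragment_a, fragment_b
    return (c_a & c_b, p_a | p_b)


def solution_set(c, p):
    """Hypotheses in c that are not dominated by another hypothesis in c."""
    return {
        h
        for h in c
        if not any(worse == h and better in c for worse, better in p)
    }


H = {"h1", "h2", "h3", "h4"}

# Fragment 1: a pure constraint that rules out h4 (no preferences).
constraint_only = ({"h1", "h2", "h3"}, set())
# Fragment 2: a pure preference, "h1 is less preferred than h2" (no constraints).
preference_only = (set(H), {("h1", "h2")})

c, p = integrate_tuples(constraint_only, preference_only)
print(sorted(solution_set(c, p)))

# With P empty, as in the IVSM instantiation, the solution set is just C.
print(sorted(solution_set(*constraint_only)))
```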

Grendel. Grendel (Cohen 1992) is another cognitive
ancestor of KII. The motivation for Grendel is to
express biases explicitly in order to understand their
effect on induction. The biases are translated into a
context free grammar representing the biased hypothesis
space.^ This space is then searched for a hypothesis
that is strictly consistent with the examples, under the
guidance of an information gain metric. Some simple
information can also be encoded in the grammar.

Grendel cannot easily integrate new knowledge.
Context free grammars are not closed under intersection
(Hopcroft & Ullman 1979), so it is not possible
to generate a grammar for the new knowledge and
intersect it with the existing grammar. Instead, a new
grammar must be constructed for all of the biases. KII
can use set representations that are closed under
intersection, which allows KII to add or omit knowledge
much more flexibly than Grendel. KII also has a richer
language for expressing preferences. Grendel-like
behavior can be obtained by instantiating KII with a
context free grammar for C.
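
To illustrate what closure under intersection provides, the sketch below intersects two regular sets with the standard product construction on deterministic finite automata. Regular sets are the representation used by RS-KII, but the `DFA` class, the `intersect` function, and the two example languages are hypothetical and are not taken from the RS-KII implementation.

```python
from dataclasses import dataclass
from itertools import product


@dataclass
class DFA:
    states: set
    alphabet: set
    delta: dict      # maps (state, symbol) -> next state; assumed total
    start: object
    accepting: set

    def accepts(self, word):
        """Run the automaton on word and report whether it ends in an accepting state."""
        state = self.start
        for symbol in word:
            state = self.delta[(state, symbol)]
        return state in self.accepting


def intersect(a, b):
    """Product construction: the result accepts exactly the words both a and b accept."""
    assert a.alphabet == b.alphabet, "sketch assumes a shared alphabet"
    states = set(product(a.states, b.states))
    delta = {
        ((p, q), s): (a.delta[(p, s)], b.delta[(q, s)])
        for (p, q) in states
        for s in a.alphabet
    }
    accepting = {(p, q) for (p, q) in states if p in a.accepting and q in b.accepting}
    return DFA(states, a.alphabet, delta, (a.start, b.start), accepting)


ALPHABET = {"a", "b"}

# Strings containing an even number of "a" symbols.
even_a = DFA(
    states={0, 1},
    alphabet=ALPHABET,
    delta={(0, "a"): 1, (1, "a"): 0, (0, "b"): 0, (1, "b"): 1},
    start=0,
    accepting={0},
)

# Strings whose last symbol is "b".
ends_b = DFA(
    states={"n", "y"},
    alphabet=ALPHABET,
    delta={("n", "a"): "n", ("n", "b"): "y", ("y", "a"): "n", ("y", "b"): "y"},
    start="n",
    accepting={"y"},
)

both = intersect(even_a, ends_b)
print(both.accepts("aab"), both.accepts("ab"))
```

The result of `intersect` is again a DFA, so a further fragment can be integrated by one more call. No analogous general construction exists for two context free grammars, which is the limitation described above.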

Future  Work 

One prime area for future work is constructing RS-KII
translators for other biases and knowledge sources,
especially those used by other induction algorithms.
This is both to extend the range of knowledge available
to RS-KII, and to test the limits of its expressiveness
with respect to existing algorithms.

A second area is investigating the naturalness of
the ⟨H, C, P⟩ representation. In RS-KII, some of the
knowledge in AQ-11 is easy to express as ⟨H, C, P⟩
tuples, but some, such as the LEF, is more awkward.
Others, such as the beam search bias, cannot be
expressed at all in RS-KII. One approach is to replace
this hard-to-express knowledge with knowledge that
achieves similar effects on induction, but is easier to
express. Similar approaches are used implicitly in
existing algorithms for knowledge that cannot be easily
used by the search. For example, AQ-11 approximates
a bias for the best hypothesis with a beam search that
finds a locally maximal hypothesis.

^More precisely, they are expressed as an antecedent
description grammar.

Finally, the space of set representations should be
investigated further to find representations that will
yield other useful instantiations of KII. In particular,
it would be worth identifying a set representation that
can integrate n knowledge fragments and enumerate
a hypothesis from the solution set in time polynomial
in n. This would provide a tractable knowledge
integration algorithm. Additionally, the set representation
for such an instantiation would effectively define a class
of knowledge from which hypotheses can be induced in
polynomial time. This would complement the results in the
PAC literature, which deal with polynomial-time learning
from examples only (e.g., (Vapnik & Chervonenkis
1971; Valiant 1984; Blumer et al. 1989)).

Conclusions 

Integrating additional knowledge is one of the most
powerful ways to increase the accuracy and reduce the
cost of induction. KII provides a uniform mechanism
for doing so. KII also addresses an apparently inherent
trade-off between the breadth of knowledge utilized
and the cost of induction. KII can vary the trade-off
by changing the set representation. RS-KII is an
instantiation of KII with regular sets that shows promise
for being able to integrate a wide range of knowledge
and related strategies, thereby creating hybrid
multi-strategy algorithms that make better use of the
available knowledge. One such hybridization of AQ-11 was
demonstrated. Other instantiations of KII may provide
similarly useful algorithms, as demonstrated by
IVSM and Grendel.